Outline

Introduction


• Central to fault-tolerant computing is redundancy
management.

• Common to proofs of fault-tolerance is a maximum fault 
assumption. 

If there are m or fewer faults in the system, then . . . 

• Typically a maximum fault assumption is rather restric- 
tive. Usually, this is necessary to avoid assumptions about 
the behavior of faulty channels. 

  - For interactive consistency, in order to tolerate m
    faults, 3m + 1 nodes are required.

  - For a majority vote, 2m + 1 channels are required.

• A maximum fault assumption is useful because it allows
us to reason about fault tolerance in the presence of
arbitrarily malicious fault behavior. However, analysis of the
architecture may establish certain scenarios in which the
assumption may be weakened.
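
The two bounds quoted above can be written down as a pair of small helpers. This is only an illustrative sketch; the function names are hypothetical and are not part of the original development.

```python
def nodes_for_interactive_consistency(m: int) -> int:
    """Nodes required to tolerate m arbitrary faults: 3m + 1."""
    if m < 0:
        raise ValueError("fault count must be non-negative")
    return 3 * m + 1


def channels_for_majority_vote(m: int) -> int:
    """Channels required for a majority vote to mask m faults: 2m + 1."""
    if m < 0:
        raise ValueError("fault count must be non-negative")
    return 2 * m + 1


def max_faults_majority_vote(channels: int) -> int:
    """Largest m such that 2m + 1 <= channels (0 if channels < 1)."""
    return max((channels - 1) // 2, 0)
```

For a four-channel system, `max_faults_majority_vote(4)` gives 1: a majority vote over four channels masks a single arbitrary fault.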

• Should fault-tolerant systems incorporate features that
attempt to recover from failure combinations that exceed
the maximum fault assumption?

• If so, what is the proof obligation? 


• At the very least, it is necessary to show that existing 
proofs which depend upon the maximum fault assumption 
still hold. 

Hypothetical Scenario

Imagine that a plurality voting circuit has been developed for
use in a four-channel fault-tolerant computing system. Suppose
that a designer is considering using this circuit in a system
which depends upon a majority vote in order to maintain
correct system state.

Can this voting circuit be used in this system? 

First we define existence predicates for majority and
plurality as follows:

$$\forall B.\ \mathit{majority\_exists}\ B = \mathrm{FINITE}\ B \land \exists x.\ |B| < 2\,|B|_x$$

$$\forall B.\ \mathit{plurality\_exists}\ B = \exists x.\ \forall x'.\ (x \neq x') \supset |B|_{x'} < |B|_x$$

where $B$ is a bag [1], $|B|$ represents its cardinality, and $|B|_x$
represents the count of $x$ in $B$.


[1] Essentially a bag is a set without absorption: [a, a, b] = [b, a, a], but [a, b] ≠ [a, a, b].

From these we define the following functions:

$$\forall B.\ \mathit{majority}\ B = \varepsilon x.\ |B| < 2\,|B|_x$$

$$\forall B.\ \mathit{plurality}\ B = \varepsilon x.\ \forall x'.\ (x \neq x') \supset |B|_{x'} < |B|_x$$
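
The definitions above can be mirrored by an executable model. The following is a minimal sketch, assuming a bag is represented as a Python list of hashable values; it is not the HOL development itself. One modelling choice is an assumption of this sketch: where the HOL choice operator would yield an unspecified value because no witness exists, the functions return `None`.

```python
from collections import Counter


def majority_exists(bag):
    """True if some value x satisfies |B| < 2 * |B|_x."""
    counts = Counter(bag)
    size = sum(counts.values())
    return any(size < 2 * c for c in counts.values())


def plurality_exists(bag):
    """True if some value occurs strictly more often than every other value."""
    ranked = Counter(bag).most_common(2)
    if not ranked:
        return False
    return len(ranked) == 1 or ranked[0][1] > ranked[1][1]


def majority(bag):
    """The majority value, or None when no majority exists (modelling choice)."""
    counts = Counter(bag)
    size = sum(counts.values())
    for value, c in counts.items():
        if size < 2 * c:
            return value
    return None


def plurality(bag):
    """The plurality value, or None when no plurality exists (modelling choice)."""
    if not plurality_exists(bag):
        return None
    return Counter(bag).most_common(1)[0][0]
```

With four channels, `['a', 'a', 'b', 'c']` has a plurality (`'a'`) but no majority, since 4 < 2 * 2 is false. This is exactly the case in which the two voters differ.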

The property we need to prove is

$$\forall B.\ \mathit{majority\_exists}\ B \supset (\mathit{majority}\ B = \mathit{plurality}\ B).$$
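
As a sanity check on this property (not a substitute for the proof), it can be tested exhaustively over small bags. The sketch below assumes bags are tuples drawn from a three-letter alphabet; the helper names, the alphabet and the size bound are illustrative choices, and `None` stands in for "no such value".

```python
from collections import Counter
from itertools import combinations_with_replacement


def majority(bag):
    """Value x with |B| < 2 * |B|_x, or None if there is none."""
    size = len(bag)
    for value, c in Counter(bag).items():
        if size < 2 * c:
            return value
    return None


def plurality(bag):
    """Value with a strictly greatest count, or None if there is none."""
    ranked = Counter(bag).most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def majority_implies_same_plurality(values="abc", max_size=6):
    """Check: whenever a majority exists, plurality returns the same value."""
    for n in range(max_size + 1):
        for bag in combinations_with_replacement(values, n):
            m = majority(bag)
            if m is not None and plurality(bag) != m:
                return False
    return True
```

In terms of the hypothetical scenario, the plurality voter agrees with a majority voter on every input for which a majority exists; the two can only differ when the maximum fault assumption has already been violated.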

The first step was to show that

$$\forall B.\ \mathit{majority\_exists}\ B \supset \mathit{plurality\_exists}\ B$$

For this, we needed to prove the following lemma:

$$\forall B.\ \mathrm{FINITE}\ B \supset (\forall x\, y.\ (x \neq y) \supset |B|_y \leq (|B| - |B|_x))$$

From this lemma, coupled with rewriting the right conjunct
of majority_exists to

$$\exists x.\ (|B| - |B|_x) < |B|_x,$$

and then using transitivity of '≤' and '<', we can establish the
existence of plurality from the existence of majority.
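
The lemma and the transitivity step can be checked on small bags. This is a hedged sketch over a finite alphabet, assuming bags are tuples of characters; the function names and the size bound are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement


def lemma_holds(bag, alphabet):
    """For all distinct x, y: |B|_y <= |B| - |B|_x."""
    counts = Counter(bag)  # missing keys count as 0
    size = len(bag)
    return all(
        counts[y] <= size - counts[x]
        for x in alphabet
        for y in alphabet
        if x != y
    )


def chain_holds(bag, alphabet):
    """If |B| - |B|_x < |B|_x, then |B|_y < |B|_x for every y != x."""
    counts = Counter(bag)
    size = len(bag)
    for x in alphabet:
        if size - counts[x] < counts[x]:
            if not all(counts[y] < counts[x] for y in alphabet if y != x):
                return False
    return True


def check_all(alphabet="abc", max_size=6):
    """Run both checks on every bag over the alphabet up to max_size."""
    for n in range(max_size + 1):
        for bag in combinations_with_replacement(alphabet, n):
            if not (lemma_holds(bag, alphabet) and chain_holds(bag, alphabet)):
                return False
    return True
```

The bag `('a', 'b')` shows why the lemma needs '≤' rather than '<': the count of `'b'` is 1 and |B| minus the count of `'a'` is also 1.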

In order to show the equivalence between majority and
plurality we needed to establish uniqueness from existence (i.e.,
if it exists then it is unique). This allowed us to substitute in
one side of the equation and then show that the chosen value
satisfied the predicate embedded in the other. [2]


[2] Thanks to Brian Graham of the University of Calgary for submitting his methods of
dealing with the HOL choice operator ('ε' or '@') to the info-hol mailing list.

Once this was done we looked at proving some other simple
facts about voting which may be useful in the analysis of
fault-tolerant architectures. Specifically, we proved the
preservation of majority for a few common reconfiguration schemes.

• Graceful Degradation 

• Perfect Spares 

• Imperfect Spares 

Of course, we neglected one of the more difficult aspects of 
reconfiguration, namely that of correctly identifying the faulty 
channel. All that we have done is prove a little bit of common 
sense. 

Graceful Degradation

The simplest reconfiguration strategy is graceful degradation.
This consists of removing a faulty channel and continuing
processing with one less channel of redundancy. The proof for
this case showed that a majority is preserved if a non-majority
element is removed from consideration.

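First we show existence:

$$\forall B.\ \forall x.\ \mathit{majority\_exists}\ B \supset (x \in B) \supset (x \neq \mathit{majority}\ B) \supset \mathit{majority\_exists}\ (B - x)$$

This essentially reduces to showing

$$|B| < 2\,|B|_x \supset (|B| - 1) < 2\,|B|_x,$$

where $x$ in this inequality denotes the majority value, whose
count is unaffected by removing a different element.

From existence we get uniqueness, so we can then show

$$\forall B.\ \forall x.\ \mathit{majority\_exists}\ B \supset (x \in B) \supset (x \neq \mathit{majority}\ B) \supset (\mathit{majority}\ B = \mathit{majority}\ (B - x))$$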
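
The graceful degradation result can be exercised on small bags. The sketch below assumes bags are tuples or lists of characters and uses `None` for "no majority"; `remove_one` is a hypothetical helper standing in for bag deletion.

```python
from collections import Counter
from itertools import combinations_with_replacement


def majority(bag):
    """Value x with |B| < 2 * |B|_x, or None if there is none."""
    size = len(bag)
    for value, c in Counter(bag).items():
        if size < 2 * c:
            return value
    return None


def remove_one(bag, x):
    """Return a copy of the bag with one occurrence of x removed."""
    rest = list(bag)
    rest.remove(x)
    return rest


def degradation_preserves_majority(values="abc", max_size=6):
    """Removing any non-majority member leaves the majority unchanged."""
    for n in range(1, max_size + 1):
        for bag in combinations_with_replacement(values, n):
            m = majority(bag)
            if m is None:
                continue
            for x in set(bag):
                if x != m and majority(remove_one(bag, x)) != m:
                    return False
    return True
```

The side condition matters: removing a majority member from `['a', 'a', 'b']` leaves `['a', 'b']`, which has no majority.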

Perfect Spares

Sometimes, in addition to removing a faulty channel, a good
channel is added to the configuration. To capture this scenario,
we showed that inserting the majority element into a bag
preserves both the existence and the value of the majority.

$$\forall B.\ \mathit{majority\_exists}\ B \supset \mathit{majority\_exists}\ ((\mathit{majority}\ B) \oplus B)$$

$$\forall B.\ \mathit{majority\_exists}\ B \supset (\mathit{majority}\ ((\mathit{majority}\ B) \oplus B) = \mathit{majority}\ B)$$
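
A small check of the perfect-spares result, under the same assumptions as before: bags are tuples of characters, `None` means "no majority", and `insert_one` is a hypothetical stand-in for bag insertion.

```python
from collections import Counter
from itertools import combinations_with_replacement


def majority(bag):
    """Value x with |B| < 2 * |B|_x, or None if there is none."""
    size = len(bag)
    for value, c in Counter(bag).items():
        if size < 2 * c:
            return value
    return None


def insert_one(x, bag):
    """Return a copy of the bag with one more occurrence of x."""
    return [x] + list(bag)


def perfect_spare_preserves_majority(values="abc", max_size=6):
    """Inserting the majority value keeps both existence and value."""
    for n in range(1, max_size + 1):
        for bag in combinations_with_replacement(values, n):
            m = majority(bag)
            if m is not None and majority(insert_one(m, bag)) != m:
                return False
    return True
```

Arithmetically, if |B| < 2 * |B|_m then |B| + 1 < 2 * (|B|_m + 1), so the inserted good channel can only strengthen the majority.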

Imperfect Spares

Finally, recognizing that it is possible for spares to fail, it
was shown that the removal of a non-majority (e.g., failed)
element coupled with the addition of an arbitrary element (of
the proper type) also preserves both the existence and the value
of the majority.

$$\forall B.\ \mathit{majority\_exists}\ B \supset \forall x\, x'.\ (x \in B) \supset (x \neq \mathit{majority}\ B) \supset \mathit{majority\_exists}\ (x' \oplus (B - x))$$

$$\forall B.\ \mathit{majority\_exists}\ B \supset \forall x\, x'.\ (x \in B) \supset (x \neq \mathit{majority}\ B) \supset (\mathit{majority}\ (x' \oplus (B - x)) = \mathit{majority}\ B)$$
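
The imperfect-spares result can be checked the same way. This sketch assumes bags are tuples of characters over a small alphabet and uses `None` for "no majority"; `swap_channel` is a hypothetical helper that removes one element and inserts another.

```python
from collections import Counter
from itertools import combinations_with_replacement


def majority(bag):
    """Value x with |B| < 2 * |B|_x, or None if there is none."""
    size = len(bag)
    for value, c in Counter(bag).items():
        if size < 2 * c:
            return value
    return None


def swap_channel(bag, removed, added):
    """Remove one occurrence of `removed`, then insert `added`."""
    rest = list(bag)
    rest.remove(removed)
    return [added] + rest


def imperfect_spare_preserves_majority(values="abc", max_size=6):
    """Replacing a non-majority member by any value keeps the majority."""
    for n in range(1, max_size + 1):
        for bag in combinations_with_replacement(values, n):
            m = majority(bag)
            if m is None:
                continue
            for x in set(bag):
                if x == m:
                    continue
                for spare in values:
                    if majority(swap_channel(bag, x, spare)) != m:
                        return False
    return True
```

The bag size is unchanged by the swap and the count of the majority value cannot decrease, which is why even a faulty spare cannot disturb the vote.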

Future Efforts


• Establish a base for reasoning about error manifestations 
in order to reason about Fault Detection and Isolation. 

When can you conclude that a redundant channel is 
faulty? 

• Explore the effects that incorporating a plurality voter 
would have on the OS proofs. 

This would require adding assumptions concerning the 
behavior of faulty channels. 

• Explore possible ways to incorporate reconfiguration
strategies into the OS effort.

How do you differentiate between a permanent and a 
transient fault? 